Professionalism and High-Stakes Tests: 
Teachers' Perspectives When Dealing 
With Educational Change Introduced 
Through Provincial Exams 

Carolyn E. Turner 


The effect of high-stakes tests on classroom activity (commonly called washback) 
is an issue that is receiving heightened attention in the literature. It is yet one 
more element that teachers need to deal with in their professional contexts. This 
article focuses on the perspectives of ESL secondary teachers as they experience
curriculum innovations introduced into the educational system via provincial 
exams. Survey results from 153 teachers are reported. The survey is part of a 
larger washback study that also triangulated classroom observation and teachers' 
and students' perception data in a longitudinal study. The survey results suggest 
that teachers would like to do their part in moving the system into a position 
where curriculum, their teaching and assessment, and the system's high-stakes 
exam correspond. They achieve this, however, according to their beliefs and 
professional stances, which may not present a unified performance across teach- 
ers. 

Les chercheurs se penchent davantage sur l'effet qu'ont les examens à enjeux
élevés sur l'activité en salle de classe (connu sous le nom de saut arrière). Il
s'agit d'encore un autre élément dont doivent tenir compte les enseignants dans
le cadre de leur travail. Cet article porte sur les perspectives d'enseignants
d'ALS au secondaire qui sont confrontés à des innovations aux programmes
scolaires introduites par les examens du ministère. Une enquête a été entreprise
auprès de 153 enseignants dans le contexte d'une étude longitudinale portant sur
le saut arrière où l'on a triangulé des données découlant d'observations en
salle de classe, d'une part et des perceptions des enseignants et des élèves,
d'autre part. Les résultats donnent à penser que les enseignants aimeraient
contribuer à faire évoluer le système de sorte à faire correspondre leur
enseignement et leur évaluation, les programmes scolaires, et les examens à
enjeux élevés. Toutefois, la contribution des enseignants reposerait sur leurs
croyances et leurs attitudes professionnelles, ce qui pourrait ne pas être
uniforme d'un enseignant à l'autre.


Introduction 

Second-language teachers have much to contend with, serving as professionals in
an ever-changing context of student populations, curriculum, and classroom
practice. One further element to deal with, which is receiving heightened
awareness in the recent literature, is the impact of high-stakes tests on
classroom activity.

This article focuses on ESL secondary teachers as they experience curriculum
innovations introduced into the educational system via provincial exams. It
reports on the perspectives of teachers as professionals in this situation. The
results reported here are part of a larger longitudinal study entitled
Investigating high-stakes test impact at the classroom level. The general
research question for the larger study is: Does the involvement of the Ministry
of Education, teachers, and students at various stages of the testing cycle
make a difference in promoting beneficial washback in terms of teaching
methodology and content, classroom testing methodology and content, participant
perceptions, and student learning strategies? Or is negative impact observed?
Analysis of the data is ongoing. To date only the initial research design and
preliminary findings have been presented (Turner, 2002, 2005). The sources of
data were classroom observations, participant interviews, teacher discussions,
case-study questionnaires, and a program-wide teacher survey.

This article specifically focuses on a program-wide teacher survey and deals
with two concepts: washback and professionalism. The characteristics of their
relationship emerged from the data and are reported here through an analysis of
teacher questionnaire results. Before going any further, explanations and
definitions are in order.

The phenomenon of the influence of tests on classroom activity is commonly
referred to as washback. In educational systems, washback can affect students,
teachers, parents, and ministries of education and other stakeholders.

One form of washback is related to innovation theory (Wall, 2000). Various
actions and consequences may occur when an educational system wants to make
changes (innovations) to a program. There are many ways to go about this. For
example, a new official curriculum or program can be developed and presented.
Another way (which may happen while waiting for a new curriculum to become
official) can be to introduce the new procedures or content into the system
through high-stakes exams. This is done in the hope that teachers will change or
align their instructional practices to correspond to the exam materials and
methodology. Teacher information sessions are sometimes offered to help with
this process. Henrichsen (1989) discusses employing high-stakes tests in this
manner as one way to enhance reform in a system. Drawing on general and language
education literature, Andrews (2004) discusses in detail the relationship
between washback and curriculum innovation.

In this article, the specific definition of washback is the extent to which the
test influences language teachers and students to do things they would not
necessarily otherwise do (Alderson & Wall, 1993). In other words, the effects
are only washback evidence if they can be linked to the introduction and use of
the test (Messick, 1996). The terms washback and test impact are used
interchangeably, although some places in the literature make a clear
distinction between washback on local effects and test impact on societal
effects (McNamara, 1998).

TESL CANADA JOURNAL/REVUE TESL DU CANADA
VOL. 23, NO. 2, SPRING 2006

A precise definition of the second term, professionalism, remains elusive in
the literature. As stated in the recent special issue of TESL Canada Journal
(2004), it seems to be a complex construct with little academic literature
(Mathews & Chuntian, 2004, p. i). If one looks further, however, definitions do
appear that are specific to a context or study. In combination, they begin to
provide a clear picture. Mathews and Chuntian use the Canadian Oxford
Dictionary (1998) definition, "the skill or quality required or expected of
members of a profession ... one that involves some branch of learning or
science." Englund (1996) includes the importance of requisite traits and
functions (e.g., teacher training, ongoing professional development), but
emphasizes the internalization of these events as individual teacher
characteristics of professionalism: "The internal quality of teaching" (p. 77).
Hedgcock (2002) expands on this and focuses on the reflective nature of
teachers, viewing professionalism as the ability to think critically about
practice, rather than relying on mechanical teaching strategies and methods.
Kumaravadivelu (2003) sees teacher programs as being responsible for creating a
climate of professionalism and for helping to develop teachers to "acquire the
necessary knowledge, skill, authority, and autonomy to construct personal
pedagogic knowledge" (p. 42). In the literature, professionalism does not
appear to take on a unified definition; instead, one is able to weave together a
multifaceted concept from specific instances. At the end of this article, the
concept is revisited and expanded in the light of what characteristics emerged
from the data in relation to teacher perspectives on dealing with educational
change introduced through provincial exams.

I first provide background on the concept of washback in general and in
second-language (L2) education in particular. This will include the necessity
to hear the voices of teachers, who are main stakeholders, concerning their
students' performance on externally developed high-stakes exams. I next mention
a longitudinal study in the province of Quebec on washback at the classroom
level and specifically report and discuss the results pertaining to ESL
secondary teachers' perspectives when dealing with provincial exams. I conclude
with an expanded definition of professionalism, reference to this survey in the
larger scheme of educational change, and to the important role that teachers as
professionals can play.





Background 

An overview of studies demonstrates that the concept of washback is highly 
complex in nature, contextually bound, and that the stakeholders (e.g., teach- 
ers, students, administrators, ministries of education, etc.) appear to be in- 
fluenced differentially (see Cheng, Watanabe, & Curtis, 2004, for a 
comprehensive overview; and Alderson & Wall, 1993, for initial hypotheses
concerning washback). We are also learning that there are diverse aspects of 
this phenomenon depending on the sociocultural, sociopolitical, and contex- 
tual factors involved, and in addition, depending on the participants in- 
volved (Turner, 2001b). We are reminded in the literature that "testing is 
never a neutral process and always has consequences" (Stobart, 2003, p. 140). 
Possibly for this reason, the terms positive and negative have become as- 
sociated with washback. Bailey (1996) claims that any test (whether "valid" 
for its purpose or not) can have either positive or negative washback (conse- 
quences) depending on whether it enhances or hinders educational innova- 
tion and goals. 

This brings us to the set of relationships (intended and unintended, posi- 
tive and negative) across curriculum, teaching and learning, and testing (Fox, 
2004). As Pellegrino, Chudowsky, and Glaser (2001) point out, the ideal 
situation is cohesion across curriculum, instruction, and assessment. This 
appears easier said than done when one examines educational systems and 
teachers' behavior and beliefs (Turner, 2002). From a teacher's position, the 
impact of a high-stakes external test can affect classroom activity in various 
ways. For example, if such a test represents the curriculum well, and a 
teacher is teaching the curriculum, then teaching with the general test con- 
cepts in mind and preparing students for the test is positive. If a teacher was 
not focusing on the curriculum, but then became aware of the test content 
and methodology (which represented the curriculum and innovations in the
curriculum), then he or she would ideally change and adjust or align some 
instruction with general concepts represented in the test. In these situations, 
elements are synchronized and this would be positive washback. The test
results would give the teacher information on student achievement in the 
program. Therefore, integrating the test's concepts and procedures into the 
instruction would mean working with students on the abilities they are 
expected to learn. 

On the other hand, if the external test's content and procedures do not 
represent the curriculum well, then there is a problem (assuming that the 
teacher is teaching the curriculum). The teacher might abandon the cur- 
riculum to prepare the students for an unrelated test. This would be negative 
washback. In this situation, the test is not serving as an evaluative or assess- 
ment tool for the course content. It is not testing what the teacher has been 
teaching, which is the curriculum. Instead, it is evaluating something else,
which does not give the teacher information on whether the students have
achieved or are making progress concerning the curriculum. The ideal situation
in an educational system is where the curriculum, teaching, and testing are
synchronized, and teachers (and other stakeholders) work for positive washback.
Solomon (2002), in her book The Assessment Bridge, discusses positive ways to
link tests to curriculum improvement.

One must realize that the above is a simplistic explanation of washback at the
classroom level. As stated above, we are learning of its complexity. It is
important, however, to include in the discussion the voices of teachers who,
along with students, are at the grass roots of experiencing test impact at the
classroom level. In the past, most reported teacher claims mainly concerned
negative washback, for example, narrowing of the curriculum, lost instructional
time, reduced emphasis on skills that require complex thinking or
problem-solving, and increases in test scores without a corresponding rise in
the ability of the construct being tested (Alderson & Hamp-Lyons, 1996;
Andrews, 2004; Barksdale-Ladd & Thomas, 2000; Firestone, Fitz, & Broadfoot,
1999; Linn, 2000). Echoes from the past (Frederiksen, 1984) remind us that
efficient tests (e.g., multiple-choice format) tend to drive out less efficient
tests (e.g., essays, open-ended interview questions, performance-based tests)
leaving important abilities untested and untaught. Many years ago, there were
calls for educators and those involved in test construction to develop
evaluation instruments that would better represent education goals and to use
these instruments to improve the learning process. At present we still see such
discussion. Pellegrino et al. (2001), in the book Knowing What Students Know:
The Science and Design of Educational Assessment, reiterate that educational
assessment does not exist in isolation, but must be aligned with curriculum and
instruction if it is to support learning. Andrews (2004) states that recently
"attention has increasingly been paid to the possibility of turning the
apparently powerful effect of tests to advantage, and using it to exert a
positive influence in support of curriculum innovation" (p. 39). Currently
teachers are trained to conceptualize testing and evaluation procedures as
tools to monitor their students' learning. They are encouraged when developing
their own in-class instruments to align them with what is being taught. In this
way, the assessment procedures serve as a progress or achievement indicator. In
this framework, teachers are often also asked to administer high-stakes tests
developed externally to their classrooms (e.g., end of year provincial exams).
Is there evidence that these high-stakes tests represent the intended
curriculum and/or innovations being integrated into the curriculum so that the
education system can move ahead in synchronization? In other words, is there
evidence of positive washback? (See Pellegrino et al., 2001, for further
discussion on revisiting both classroom and high-stakes assessment and how to
ensure that both of these approaches together inform and enhance student
achievement.) What is a teacher's professional role in this context?

One way to begin to look at such a question is to seek the perspectives of 
teachers who are presently working in educational systems with high-stakes 
exams that are used to assess achievement and to support curriculum in- 
novations. Managing external exams has become a way of life for many 
teachers. Their ability to deal with them as part of their pedagogical experi- 
ence is rapidly becoming a professional criterion. Reports from the past 
describe in general a negative picture of teachers trying to cope. As teachers 
are trained and become more informed about assessment and the need for 
synchronization as discussed above, it is important to keep abreast of their 
perspectives. The rest of this article reports on a teacher survey that is an 
integral part of a larger study on washback at the secondary level concerning 
secondary ESL teachers and students in the French school system in the 
province of Quebec. The teacher survey begins to shed light on a positive 
washback story as teacher professionalism emerges in dealing with external 
high-stakes tests. 

Methodology 

Purpose and Research Questions 

The purpose of the teacher survey was to identify the perspectives or beliefs 
of teachers when a change in the educational system was introduced to them 
during a school year and then implemented in the end-of-year provincial 
exam of that same year. In other words, the goal was to explore their views 
about this situation and the consequences on their behavior and on class- 
room activity. The major research questions were: What do teachers do in 
their classrooms when a change in the educational system is introduced 
through an external high-stakes test? What do they feel is their professional 
responsibility in reacting to this method to promote curriculum reform? 
Specifically, the inquiries were to learn about teacher perspectives on how 
such an innovation affects: what teachers teach (content); how they teach 
(methodology; e.g., Is it "business as usual" in your classroom? Do you
integrate the new ideas into your teaching? Do your classroom teaching 
content and methodology change? Do your attitudes or beliefs change?). 

Population and Context 

This study was situated in Quebec, where English-as-a-second-language 
(ESL) is taught in the school system from grade 3 onward. The participants in 
the survey were 153 secondary 4 and 5 ESL teachers across Quebec. Detailed 
information about this sample population is found in Presentation and Discus- 
sion of Results below. 





At the end of high school in Quebec, provincial exams are administered in all
mandatory subjects, which include ESL. These exams are prepared by teachers and
consultants under MEQ coordination. They are worth 50% of the final mark or
grade for students, so students must pass in order to obtain their high school
diploma. Under these circumstances these exams are considered high-stakes tests
in Quebec.

The Quebec education system is presently undergoing a reform that includes
curriculum, organizational, and responsibility changes (Blais & Laurier, 2005).
Emphasizing a constructivist approach, the curriculum is competence-based. This
is being carried out through a decentralization toward schools and communities
and a focus on the importance of teachers' professional judgment and student
autonomy. Those involved with the ESL curriculum see this as an opportunity to
focus on speaking ability. Classroom instruction and assessment of speaking
ability have evolved over the years, but still remain a challenge for many
teachers. With French being the first language in the province, exposure to
English and the need to speak English are limited (with the exception of
sections in the metropolitan area of Montreal). Due to time and resource
constraints, some teachers do not focus on practicing and evaluating speaking
in the classroom as reflected in the new developing curriculum and goals of the
Ministry of Education of Quebec (MEQ) in educational reform. In order to
generate more speaking practice, the MEQ decided to introduce specific changes
formally (i.e., innovations) into the speaking section of the secondary
provincial exam. The intention and hope was that the changes in the exam would
be one of several ways to encourage and motivate teachers to practice speaking
activities more often with their students and to use English throughout the
process.

There were three distinct innovations. The first was a new, empirically derived
rating scale that was to be used as speaking assessment criteria (see
Gouvernement du Québec, 2004, for the revised scale; Turner, 2001a; Turner &
Upshur, 1996, for the general scale development process). The MEQ put the main
emphasis on these new performance rating criteria and their usefulness for both
teachers and students, as they reflected the curriculum goals for speaking
ability. The second innovation introduced was English-only exam instructions
(both written and oral) as opposed to instructions in French. The third
innovation was a modified speaking assessment task format, that is, a move from
one-on-one interviews to student group discussions. In addition, students were
allowed individual preparation time before the assessment task.

The speaking assessment task is a group discussion involving three to four
students. All students are given instructions in English both orally and in
writing. They are reminded that during the discussion they are to listen to
their peers, ask questions, and express their views. Each student chooses a
card that has a topic written on it (e.g., decorating my room, the secret to
success, tattooing). They are given a five-minute period to prepare. Students
take turns leading a discussion. They are to start by expressing their own
views and/or knowledge on the topic. The other students are expected to react
by agreeing or disagreeing, asking questions, and so forth. Each student in the
group is given a turn to lead a discussion. During this process the teacher
circulates and assesses the students individually using the new rating scale
criteria.

In order to help teachers familiarize themselves with this new aspect of the
curriculum (which was being implemented through the exam), pre-exam actions
were taken. Some examples are: groups of teachers were an integral part in
developing, validating, and setting standards for the new speaking scale; and
workshops, CD-ROMs and written materials on instructional strategies were
provided to teachers about the use of instructions in English, speaking group
tasks, and how to use the new speaking scale with authentic samples.

Instruments 

The instrument used in the survey was a questionnaire composed of two parts
(see Appendix). Part 1 asked for background information to help describe the
population, and Part 2 asked for teachers' views specifically related to the
three innovations introduced into the speaking section of the provincial ESL
exam as mentioned above (first innovation, the rating scale: items #2 through
#8; second innovation, English-only instructions: items #9 and #10; and third
innovation, modified speaking assessment task: items #1 and #11). It also asked
general questions on washback beliefs (items #12 and #13). The scale used in
Part 2 was a Likert scale ranging from 1=strongly disagree to 4=strongly agree,
with the exception of the last three questions, which were open-ended (items
#14 and #15 were local procedural questions; and item #16 was a
washback-related question about speaking exam preparation). A 4-point scale was
purposely used to elicit distinct views and to eliminate the ambiguity of "I
don't know" or "I don't have a view." The questionnaire was developed and
piloted by the research team for the specific purposes of this study.

Procedure 

Participants were recruited through provincial professional forums (i.e., the
annual SPEAQ conference, la Société pour la promotion de l'enseignement de
l'anglais, langue seconde, au Québec; SPEAQ's interest sections; and SPEAQ's
newsletter). Information about the study, including ethical procedures for such
survey research, was communicated to the participants. The questionnaire was
anonymous and was administered after the provincial speaking exam had taken
place; teachers filled it out individually.





Data Analysis 

The data from the questionnaire were analyzed using two methods. SPSS 12 
was employed for frequency counts (percentages) of the Likert scale ques- 
tions and for descriptive statistics of the same questions. The added written 
comments for each question were reviewed to help interpret the numbers. 
Due to the quantity of comments, an analysis for each question following 
guidelines as summarized in Tesch's (1990) 10 principles of interpretational
analysis was carried out. Comments representing the main patterns are 
reported in the results. The comments for Q12 and Q13 (i.e., exam affects or 
should affect teaching and learning), however, were combined with the 
open-ended question Q16 (i.e., teacher preparation for the speaking exam) 
for analysis. This was done because the three questions generated comments 
with much common content and repetition. The qualitative software 
NUD*IST 4 was used to help organize these data. Due to the overwhelming
quantity of responses and to the fact that in general each response was in 
paragraph form containing several ideas, a qualitative analysis of the com- 
ments was carried out similar in nature to open coding as described in 
Strauss and Corbin (1998) and following guidelines as above (Tesch). Cate- 
gories were developed from the comments and coded (Bogdan & Biklen, 
1998; Marshall & Rossman, 1989). Patterns or themes were identified. I conducted
the initial analysis, and another member of the research team, a
research assistant, did an independent analysis. Similar categories resulted, 
but some labeling of the categories differed. Through discussion a consensus 
was reached as to the wording. 

Presentation and Discussion of Results 

Participants' Background Details 

As stated above, the participants were 153 secondary 4 and 5 ESL teachers in 
Quebec. Part 1 of the questionnaire revealed that they were all situated in the 
French school system. Sixty-one percent were female and 39% were male. All 
participants had BEd degrees, and 3% had MA degrees. All but 4% had had 
specific ESL training. All but 6% had either taken courses or been involved in 
workshops on testing and evaluation. There were novice and veteran teach- 
ers alike, distributed across four age categories (16% were 20-29 years old; 
32% were 30-39; 27% were 40-49; and 25% were over 50). Their teaching 
experience was distributed across four categories (5% had been teaching for 
0-2 years; 16% for 3-6 years; 20% for 7-10 years; and the majority 50% for 11 
or more years). The first language for 76% of the teachers was French, for 20% 
was English, and for 4% was other. The teachers came from nine regions 
across Quebec with the highest representation coming from Central Quebec 
(26%) and the Eastern Townships (in southern Quebec) (20%), and the lowest 
representation from Montreal (6%) and James Bay/Northern Quebec (4%).





Teachers' Views Relating to Innovations Introduced Into the Speaking Section
of the Provincial ESL Exam: Perspectives From Professionals

As described above, the three new elements implemented in the speaking section
of the exam were the rating scale, English (only) instructions, and group
discussion tasks with preparation time. Using the definition of washback given
above, teachers' perceptions were analyzed to seek evidence of the influence of
the new speaking exam components at the classroom level. As we know, teacher
views, perceptions, and beliefs are complex constructs. To help gain insight
into the data, both the quantitative and qualitative questionnaire data are
presented and discussed together so as to provide an interpretative profile.
Rather than lengthy descriptions of what teachers wrote, direct quotations
represent teachers' voices. The quotations were viewed as representative of the
main patterns discovered through the data analysis. Table 1 summarizes the raw
data by presenting the percentage of teachers responding in each category on
the 4-point Likert scale. For reporting purposes, categories 1 and 2 (strongly
disagree/disagree) were combined (i.e., collapsed) into one general category of
disagreement, and categories 3 and 4 (agree/strongly agree) became one general
category of agreement. For this study and sample size, the different levels of
agreement and disagreement were viewed as being less useful for discussion.
Specific levels are only mentioned when pertinent. Table 2 views the data
through descriptive statistics.
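
The collapsing step and the descriptive statistics can be illustrated with a
minimal sketch. It assumes only the rounded percentages printed in Table 1 for
three items; the function names collapse and mean_and_sd are hypothetical
illustrations and are not part of the study, which used SPSS 12. Because the
published percentages are rounded, the recomputed means and standard deviations
should be expected to match Table 2 only approximately.

```python
from math import sqrt

# Rounded percentages for categories 1..4, copied from Table 1 (three items only).
TABLE_1 = {
    "Q1": (0, 9, 75, 16),
    "Q2": (2, 9, 64, 25),
    "Q5": (13, 42, 40, 5),
}


def collapse(percentages):
    """Merge categories 1-2 into disagreement and categories 3-4 into agreement."""
    strongly_disagree, disagree, agree, strongly_agree = percentages
    return {
        "disagreement": strongly_disagree + disagree,
        "agreement": agree + strongly_agree,
    }


def mean_and_sd(percentages):
    """Approximate the mean and population SD of the 1-4 scores from percentages."""
    total = sum(percentages)
    scored = list(enumerate(percentages, start=1))  # (score, percentage) pairs
    mean = sum(score * pct for score, pct in scored) / total
    variance = sum(pct * (score - mean) ** 2 for score, pct in scored) / total
    return mean, sqrt(variance)


for item, pct in TABLE_1.items():
    merged = collapse(pct)
    mean, sd = mean_and_sd(pct)
    print(
        f"{item}: agreement {merged['agreement']}%, "
        f"disagreement {merged['disagreement']}%, "
        f"mean {mean:.2f}, SD {sd:.2f}"
    )
```

Working from percentages rather than the raw responses is a simplification made
here for illustration; with the original response file, the same figures would
come directly from frequency counts and descriptive statistics.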

Table 1

Teachers' Responses: Frequency Counts in Percentages (n=153)

Question                                  1=Strongly   2=Disagree   3=Agree   4=Strongly
                                          Disagree                            Agree
1-Exam tasks appropriate indicators            0%           9%        75%        16%
2-Scale accurately measured                    2%           9%        64%        25%
3-Felt comfortable using scale                 0%           9%        42%        49%
4-Practiced using scale                        5%           2%        31%        62%
5-Scale changed my thinking                   13%          42%        40%         5%
6-Scale changed my teaching                   16%          48%        29%         7%
7-Explained scale to students                  5%           2%        37%        56%
8-Students used scale                         11%          30%        31%        28%
9-Increased English instructions              33%          17%        24%        26%
10-English instructions problematic           36%          43%        16%         5%
11-Speaking tasks increased                   20%          29%        39%        12%
12-Exam affects teaching/learning              2%          26%        61%        11%
13-Exam should affect teaching/learning       11%          34%        46%         9%

Table 2

Teachers' Responses: Descriptive Statistics (n=153)*

Question                                   Min.    Max.     Mean      SD
1-Exam tasks appropriate indicators           2       4     3.07     .50
2-Scale accurately measured                   1       4     3.11     .65
3-Felt comfortable using scale                2       4     3.40     .66
4-Practiced using scale                       1       4     3.51     .76
5-Scale changed my thinking                   1       4     2.36     .77
6-Scale changed my teaching                   1       4     2.27     .82
7-Explained scale to students                 1       4     3.44     .77
8-Students used scale                         1       4     2.76     .99
9-Increased English instructions              1       4     2.43    1.21
10-English instructions problematic           1       4     1.90     .85
11-Speaking tasks increased                   1       4     2.44     .95
12-Exam affects teaching/learning             1       4     2.80     .65
13-Exam should affect teaching/learning       1       4     2.52     .82

*Likert scale: 1=Strongly Disagree, 2=Disagree, 3=Agree, 4=Strongly Agree.

The teachers agreed that the group discussion format appeared to be an
appropriate indicator of students' speaking ability (Q1) and that the new scale
helped accurately measure students' ability (Q2). From Table 1, we see a
similar pattern in responses. For Q1, 91% of the teachers agreed and for Q2,
89% agreed when categories 3 and 4 are collapsed into one category. The mean
and standard deviation in Table 2 indicate this also, 3.07 (.50) and 3.11
(.65). The combined comments demonstrated teachers' knowledge of the "method
effect" (Bachman, 1990), that is, the effects that "task characteristics"
(Bachman & Palmer, 1996) including the rating system may have on student
performance; and teacher knowledge that student familiarity with the task
format may help enhance student performance:

Good as long as the topic is relevant to the students' reality.

Good in general, but it depends on other factors, e.g., as long as the
students are comfortable with the other group members. It needs to be
authentic.

I like the group format for the exam with everybody talking at the same
time because students do not feel like they are being watched and are
less shy.

Next time, I will start using this type of task earlier in the term because
it is beneficial to students.

Some of the topics are too abstract and difficult for the lower spectrum
of students.

[With the scale] it is much easier to give an appropriate evaluation now.
We don't have to "guess" anymore.







Teachers indicated that they felt comfortable using the speaking scale 
(Q3) and took time to practice using it in their classrooms (Q4). When 
collapsing categories 3 and 4, teacher agreement was 91% and 93% respec- 
tively. Table 1 shows that category 4 (strongly agree) obtained the highest 
percentage of responses for both questions. The means in Table 2 fall
approximately in the middle of categories 3 and 4. The comments reveal that
teachers took advantage of the information or training sessions that were set
up for them and felt confident going back to their classrooms and practicing
using the scale.

Our school board allowed for a complete initiation of how the new scale
works. I feel much more comfortable with this scale than the one before.

I was able to go back to the classroom and practice a lot and integrate it
into my evaluation system.

Teachers who did not have the opportunity to practice (Q4) expressed their
frustration in not being able to do so and blamed it on lack of time.

I had no time, but would like to have more time to let students
participate in speaking activities and use the scale.

There was less agreement on whether the new speaking scale changed
teachers' ways of thinking about assessment (Q5) and changed their teaching
practices (Q6). When collapsing categories 1 (strongly disagree) and 2
(disagree), teacher disagreement with the statements was 55% and 64%
respectively. It must be noted, however, that most responses were found in the
middle of the scale, categories 2 (disagree) and 3 (agree). This is reflected in
the means and standard deviations in Table 2: Q5, 2.36 (.77) and Q6, 2.27 (.82).
The comments provided insight into the variation of views and also
provided teachers' professional stances on their own teaching.

It [the scale] helped organize my thinking and helped me to mark more
fairly.

It took the "self-interpretation" out of it.

The scale is a much better tool for assessment and it changed the way I
listen to students. I focused more on accuracy and whether the student's
discourse was developed and supported. I like the flow chart aspect.

Didn't really change my thinking. Just that I was now able to more
fairly mark.

Did not change my way of thinking, but I used to modify the old rating
scale, placing students between levels; therefore this scale is
accommodating. It asks easy-to-answer questions about the students'
ability.

It didn't change my teaching, but it confirmed what I already believed
about it.

Near the end of the year I had to start using it to get the students ready.


TESL CANADA JOURNAL/REVUE TESL DU CANADA
VOL. 23, NO. 2, SPRING 2006





for them to get familiar with the approach. So it changed my teaching in 
some ways. It's a good way to evaluate even though it is not easy. 

Teachers reported that they did explain the new scale to their students 
(Q7) with 93% agreement when collapsing categories 3 and 4 together in 
Table 1. Table 2 shows a mean and standard deviation of 3.44 (.77). Their 
comments: 

It reassures them [the students] as the final exam seemed to make them 
a bit more nervous than other oral productions in the year. 

Yes, I gave them all a copy. 

With the CD provided by the MEQ and a copy of the grid [scale] it was 
easy to go over it with the students. 

There was less agreement on whether students had the opportunity to use
the speaking scale themselves (Q8). Table 1 indicates that responses were
mainly spread across categories 2, 3, and 4, and Table 2 shows a mean of 2.76
(.99). Comments revealed that teachers' beliefs fell into two areas: (a)
students not being able to use such an instrument in that they cannot recognize
their own errors; and (b) students should try using the scale to be more aware
of what they are being evaluated on.

I don't feel students are ready for that, and they would be too hard on 
themselves. 

I think it's a tool for the teacher, but I explain it to the students. 

It is too tough for students because they aren't aware of their own 
mistakes. 

The students all had a copy and had to evaluate themselves. I like to 
compare theirs to mine. 

Yes, it's important, but many of my student's failed to see errors in their 
own speech. 

Q9 and Q10 dealt with the new component of having all instructions on
the speaking exam in English. Q9 sought to find out if teachers increased
their use of English instructions in the classroom as the exam neared, and
Q10 sought views on whether the use of English instructions was
problematic. For Q9, Table 1 indicates an even split between agreement and
disagreement (50%, 50%) if one combines categories 1 and 2, and then
combines 3 and 4, and Table 2 shows a mean and standard deviation of 2.43
(1.21). The comments help make sense of the responses. A portion of the
teachers did not increase English instructions because they already
conducted their classes in English: "No, because I do everything in English
anyway." Others apparently did conduct their classes in French and
responded that yes, they did increase exposing their students to English
instructions as the exam neared: "Yes English instructions increased in my
classroom as the exam approached, but I still sometimes switched to French
to ensure that every student understood and to avoid repeating." At the
same time in Q10, although Table 2 indicates that 79% (when combining
categories 1 and 2) did not find the use of English instructions on the exam
problematic, they expressed concern about the weaker students, but also felt
that students in general would now have to make an effort to read the
English.

Yes and No, my students are used to writing tests where all instructions
are in English. But the weaker students will miss not having me in the
room to explain the odd word in French.

The weaker students had some difficulty in understanding some parts,
not sure of themselves.

Only my weaker students, because before I would give individual
instructions in French for people who were way, way lost.

I found that the students had always relied on the French translation on
the exam. With them now in English, I found they paid more attention
to the instructions, therefore, made fewer "stupid" errors.

No, it was time to do this. Great idea! I always proceeded in English
anyway, but I know that in the past exams having instructions in French
reduced anxiety levels.

The responses to Q11 (i.e., speaking tasks similar to the exam increased as
the exam neared) were nearly evenly split between agreement and
disagreement, with 49% in categories 1 and 2 and 51% in categories 3 and 4.
Table 2 shows a mean and standard deviation of 2.44 (.95). Once again, the
comments provided insight in much the same way as in Q9. A portion of the
teachers did not increase such tasks in their classrooms because they were
already an integral part of their teaching: "No, because they already have
similar activities at regular intervals throughout the year." Another portion
who also responded in categories 1 and 2 did not feel they had the time, but
it did not matter because they encouraged speaking in English all the time
anyway.

I didn't conduct any special preparation. Nothing like that, no. I was
always trying to encourage class discussions or elicit answers in
English, but to spend that amount of time on activities similar to the
exam, there just isn't time for it.

Those who responded in categories 3 and 4 did increase speaking tasks as the
exam approached. Several of the comments revealed that they did this to
help the students feel comfortable with the format of the tasks.

Yes, I increased the tasks despite the fact that the students are in teams
all year. I follow procedures a little more, to make them more
comfortable.

I would not have focused so much on the tasks that required students to
talk in groups ... if they had not been on the speaking exam. I had to
allow students to practice. Overall, I believe this was good.

Teacher views on washback: The impact of the provincial speaking exam 
In analyzing the comments from Q12 (i.e., exam affects teaching-learning), 
Q13 (i.e., exam should affect teaching-learning), and the open-ended question 
Q16 (teacher preparation for the speaking exam), it became apparent that 
there was much overlap and in reality much repetition. The decision was 
made to combine all the comments and do a qualitative analysis to reveal 
patterns or themes in the data (see Data Analysis). The teachers had much to
say and in many cases connected their comments in the three questions by
making cross references. Results from Q12 and Q13 in Tables 1 and 2 show a
distinction between teachers' views on whether the exam does affect
teaching-learning and whether it should, with 72% in agreement with the former
and 55% in agreement with the latter (when combining categories 3 and 4).
The respective means and standard deviations are 2.80 (.65) and 2.52 (.82). In
each question, however, the highest percentage is found in category 3
(agree). Analyzing the comments related to the two questions, in addition to
the comments in the open-ended Q16, provided a window into teachers' views
related to washback.
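
As an illustration of how collapsed percentages and descriptive statistics of
the kind reported above can be derived from raw four-point Likert responses,
the following is a minimal Python sketch. The function name summarize_likert
and the response list are hypothetical and are not data from this survey. The
use of the sample standard deviation (n - 1 denominator) is also an assumption,
because the article does not state which form was computed.

```python
from collections import Counter
from statistics import mean, stdev


def summarize_likert(responses):
    """Summarize four-point Likert responses (1=Strongly Disagree ... 4=Strongly Agree).

    Collapses categories 1 and 2 into "disagree" and 3 and 4 into "agree",
    and reports the mean and the sample standard deviation.
    """
    counts = Counter(responses)
    n = len(responses)
    disagree = counts[1] + counts[2]  # categories 1 and 2 collapsed
    agree = counts[3] + counts[4]     # categories 3 and 4 collapsed
    return {
        "pct_disagree": round(100 * disagree / n),
        "pct_agree": round(100 * agree / n),
        "mean": round(mean(responses), 2),
        "sd": round(stdev(responses), 2),  # assumption: sample SD (n - 1)
    }


# Hypothetical responses from ten teachers to a single item (illustration only).
hypothetical_responses = [1, 2, 2, 3, 3, 3, 3, 4, 4, 4]
summary = summarize_likert(hypothetical_responses)
print(summary)
```

A mean that falls between 2 and 3 together with a near-even agree/disagree
split, as for Q9 and Q11, is the pattern for which the teachers' written
comments were needed to interpret the numbers.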

The categories generated from the three questions are presented in Table 
3. 

The most salient theme that emerged from the data was teachers' aware- 
ness of the importance of the link between teaching and assessment. This 
recurring theme is articulated in the following comment. 

As a student teacher, I used to believe that evaluation practices should 
not drive teaching, but now [as a teacher] not only do I realize how 
much they do, but that the evaluation practices should reflect what 
students have been taught in the classroom. There is a connection 
between this and the provincial exam. 

Table 3
Teachers' Perspectives on Washback:
Categories Generated From Q12, Q13, and Q16

• Awareness of link between teaching and assessment
• Aligning teaching practice with exam task characteristics
• Actual classroom strategies
• No special strategies

Other themes were related to strategies in the classroom in relation to the
provincial exam. It appears that teachers felt that aligning classroom practice
with the exam construct (i.e., speaking with peers) was important, but they
took this alignment to mean different things. Some took it literally and felt
restricted, whereas others took a broader approach. Some felt obligated to
practice the exact format of the task with pertinent vocabulary for expressing
opinion and asking questions (i.e., put students in groups and have them
take turns leading a discussion). Other teachers felt satisfied that by
encouraging students to speak and conducting several speaking activities
throughout the year, the students would be sufficiently prepared for the
speaking provincial exam. The following comments reflect these various
professional stances.

I believe evaluation procedures should reflect what has been taught in
class. So if we stick to the program objectives, then the students should
be okay on the final exam . . . but I sometimes would like to do
"different" stuff.

I agree since evaluation equals the objectives. I disagree since sometimes
we miss the point wanting to fill the needs for evaluation.

I agree, but unfortunately not enough importance is put on formative
evaluation. Passing the final exam is not an end in itself. Becoming
competent in your second language should be the goal. The whole
evaluation system including classroom evaluation should work together.
I prepare my students throughout the term for the speaking section of
the exam by giving them activities similar to the exam.

Just before the exam, I gave my students a handout with written
samples from the teacher's booklet. I also added lists of useful
expressions so they had something they could review before the exam.
We prepared before hand by brainstorming various subjects (through a
cooperative approach). We looked at pertinent vocabulary, verb tenses,
key expressions for expressing opinions and possible questions that
could be asked. This was greatly appreciated by the students.

I didn't prepare in any special way, because we do regular different
speaking activities throughout the whole year, real daily expressions,
debates, etc.

I didn't specifically prepare them other than a previous speaking exam.
Throughout the year, they have speaking activities, and the "final"
becomes in my opinion the "final activity."

A final theme in the data was teachers' concern in helping and supporting
students to perform well. This is naturally related to the strategy themes
above, but it went a step farther in illustrating teachers' desire to aid their
learners.

As a teacher, I want my students to do well on the exam so it is
important that I prepare them for this.

Yes, I spent a lot of time preparing my students ... I want to make sure
they are not surprised when they see the exam. Evaluation means a lot
to them. Students often say, "does it count?"

I want my students to speak in English all the time, so that the exam
will just be like another conversation, and so that nervousness, mood,
etc. will not interfere with their speaking performance.

Throughout the data analysis, as the researcher I was grateful to the
teachers for how they provided comments. Their articulate responses aided
in interpreting their views.

Discussion 

In the context of a larger washback study, the purpose of this survey was to
identify the perspectives or beliefs of teachers when a change in the
educational system (i.e., ESL speaking practice) was introduced during a school
year and then implemented in the end-of-year provincial exam. The inquiry
was to examine positive or negative washback as seen through the lenses of
teachers. The survey results provide a window into a situation that emerged
as much more complex. Rather than simply embracing or rejecting
introduced changes, teachers appear to have integrated them into their teaching
or assessment practice according to their own beliefs and professional
stances. Their reactions seem to reflect that this was all part of a day's
work, part of their professional repertoire. We learned that through
experience and/or formal education these teachers displayed knowledge about
many important elements that abound in the language testing and assessment
literature about educational contexts: the effect of method including scoring
on student performance (Bachman, 1990; Bachman & Palmer, 1996); the effect of
student familiarity with task type and scoring criteria (Genesee & Upshur, 1996;
Arter & McTighe, 2001); the importance of linking curriculum, teaching, and
assessment (whether the latter be classroom-based or external high-stakes
test, Pellegrino et al., 2001; Solomon, 2000); and the understanding that it is at
the classroom level where teaching and learning occur and that formative
evaluation at this level has an important and different role than high-stakes
provincial exams (James & Gipps, 1998). We learned that the teachers
believed that they aligned their related classroom practice with the new
elements of the provincial speaking exam. This was done in varying ways
contingent on the current state of affairs in their respective ESL classrooms.
They expressed the belief that the system (both classroom and provincial
levels) should work together.

The findings do not indicate what has been the general trend in the
literature about teachers' reports, that is, a negative washback story from
teachers' perspectives. Instead, in this survey a more positive washback
context has emerged. Although some studies have reported teachers'
positive attitudes in relation to some aspects of high-stakes tests (Cheng, 2004),
and other literature has alluded to this potential (Andrews, 2004) or discussed
solutions to create such a context (Solomon, 2000), few studies have reported
on and woven together a profile such as the one in this study. Possibilities for
this variation may be attributed to the teachers' stances and perspectives
concerning innovations as found in this population of teachers. As Fullan
and Stiegelbauer (1991) observe:

If we know one thing about innovation and reform, it is that it cannot be
done successfully to others. It is not as if we have a choice whether to
change or not. Demands for change will always be with us in complex
societies; the only fruitful way ahead is to carve out our own niche of
renewal and build on it. (p. xiv)

Although the innovations in this study were imposed through the
provincial exam, they were actually intended curriculum and methodological
changes in educational reform (and the MEQ invited teacher participation in
aspects of their development). The data demonstrate that the teachers
appeared to view them as such and integrated them into their teaching and
assessment practice.

Although the results from this survey appear to reflect an image of
positive washback, it appears that there is a need to revisit what positive
washback might mean in this context. In earlier literature, it is discussed as a
pedagogical phenomenon in which the various elements of an educational
system (curriculum, teaching, assessment) move toward synchronization
when changes are introduced through a high-stakes exam (i.e., innovation
theory, Wall, 1999, 2000). This survey has provided insight into the nature of
how this might take place from teachers' perspectives. Teachers may or may
not embrace the changes, but they cope with them as part of their work and
integrate them into their teaching practice. In the process, they express their
views as to the nature of the changes. Teachers appear to want to do their
part in moving the system into a position where curriculum, their teaching
and assessment, and the system's high-stakes exam correspond. They have
done this according to their beliefs and professional stances, which in the end
may not present a unified performance across teachers. It does, however,
demonstrate influence from the final provincial exam on teachers'
perceptions of their behavior.

Conclusion 

The results of this survey bring us back to the beginning of this article and the
discussion about washback, professionalism and innovation theory. The
teachers here expressed the will to move ahead with changes that were
introduced through the speaking exam and to move toward a
synchronization of curriculum, teaching, and assessment in general. In this
professional stance, it became apparent, however, that they struggled at times
with factors pointing toward a need for better alignment between assessments
used for different purposes (classroom-based assessment and high-stakes
provincial exams). The teacher perspectives and stances that emerged
contribute to the multifaceted concept of professionalism.

With enhanced student learning as the intended goal, more efforts and
research are needed in order that assessments at all levels work together in a
system that is comprehensive, coherent, and continual (Pellegrino et al.,
2001). The role that teachers can play at the classroom level is revealed in the
professional stances that emerged. This is yet another indication of the
pivotal role that teachers can play in our educational contexts.

Acknowledgments 

This project was funded by a grant from the Social Sciences and Humanities Research Council 
(SSHRC), and in addition by SPEAQ (la Société pour la promotion de l'enseignement de
l'anglais, langue seconde, au Québec). I thank my research assistants, Yvonne Christiansen,
Christian Colby-Kelly, Tim Dougherty, Kerry Hatsipantos, and Christopher Sikorsky; and 
Catherine MacDonald and Elizabeth Johnston at the MEQ for their assistance. I also express my 
appreciation to all the contributing teachers who took the time to provide rich data. Also, thanks 
to the anonymous reviewers for their useful comments. 

The Author 

Carolyn E. Turner is an associate professor and Director of Graduate Programs in the Depart- 
ment of Integrated Studies in Education at McGill University. Her main focus and commitment 
are language assessment and testing in educational settings. She pursues these through her 
teaching, research, and service. She has published in journals such as Language Testing, TESOL 
Quarterly, and the Canadian Modern Language Review. She is currently Associate Editor of Lan- 
guage Assessment Quarterly. 

References 

Alderson, J.C., & Hamp-Lyons, L. (1996). TOEFL preparation courses: A study of washback.
Language Testing, 13, 280-297.

Alderson, J.C., & Wall, D. (1993). Does washback exist? Applied Linguistics, 14, 115-129.

Andrews, S. (2004). Washback and curriculum innovation. In L. Cheng, Y. Watanabe, & A. 
Curtis (Eds.), Washback in language testing: Research contexts and methods (pp. 37-50). 
Mahwah, NJ: Erlbaum. 

Arter, J., & McTighe, J. (2001). Scoring rubrics in the classroom. Thousand Oaks, CA: Corwin
Press.

Bachman, L.F. (1990). Fundamental considerations in language testing. Oxford, UK: Oxford 
University Press. 

Bachman, L.F., & Palmer, A.S. (1996). Language testing in practice. Oxford, UK: Oxford 
University Press. 

Bailey, K.M. (1996). Working for washback: A review of the washback concept in language 
testing. Language Testing, 13, 257-279. 

Barksdale-Ladd, M.A., & Thomas, K.F. (2000). What's at stake in high-stakes testing: Teachers 
and parents speak out. Journal of Teacher Education, 51, 384-397. 

Blais, J.G., & Laurier, M. (2005, April). Accountability and standardized testing in Quebec and some 
neighbouring US states. Paper presented at the annual meeting of the American Educational 
Research Association, Montreal. 

Bogdan, R., & Biklen, S. (1998). Qualitative research for education. Cambridge, UK: Cambridge
University Press.





Cheng, L. (2004). The washback effect of a public examination change on teachers' perceptions
toward their classroom teaching. In L. Cheng, Y. Watanabe, & A. Curtis (Eds.), Washback in
language testing: Research contexts and methods (pp. 147-170). Mahwah, NJ: Erlbaum.

Cheng, L., Watanabe, Y., & Curtis, A. (Eds.). (2004). Washback in language testing: Research 
contexts and methods. Mahwah, NJ: Erlbaum. 

Englund, T. (1996). Are professional teachers a good thing? In I.F. Goodson & A. Hargreaves
(Eds.), Teachers' professional lives (pp. 75-87). London: Falmer Press.

Firestone, W.A., Fitz, J., & Broadfoot, P. (1999). Power, learning and legitimation: Assessment
implementation across levels in the United States and the United Kingdom. American
Educational Research Journal, 36, 759-793.

Fox, J. (2004, October). Language test impact: Practices and possibilities. Paper presented at the 
meeting of the Midwestern Association of Language Testers, Cleveland. 

Frederiksen, N. (1984). The real test bias: Influences of testing on teaching and learning.
American Psychologist, 39, 193-202.

Fullan, M.G., & Stiegelbauer, S. (1991). The new meaning of educational change (2nd ed.). New 
York: Teachers College Press. 

Genesee, F., & Upshur, J.A. (1996). Classroom-based evaluation in second language education. 
Cambridge, UK; New York: Cambridge University Press. 

Gouvernement du Québec, Ministère de l'Éducation. (2004). Document d'information: Épreuves
uniques, anglais, langue seconde, de quatrième et cinquième année de secondaire 156-444 et
156-544 (No. 16-7105-05, 16-7181-05). Retrieved October 15, 2004, from
http://www.meq.gouv.qc.ca/dgfj/de/docinfosec.htm

Hedgcock, J.S. (2002). Toward a socioliterate approach to second language teacher education. 
Modern Language Journal, 86, 299-317. 

Henrichsen, L.E. (1989). Diffusion of innovations in English language teaching: The ELEC effort in 
Japan. New York: Greenwood Press. 

James, M., & Gipps, C. (1998). Broadening the basis of assessment to prevent the narrowing of 
learning. Curriculum Journal, 9, 285-297. 

Kumaravadivelu, B. (2003). Beyond methods: Macrostrategies for language. New Haven, CT: Yale 
University Press. 

Linn, R.L. (2000). Assessments and accountability. Educational Researcher, 29(2), 4-16. 

Marshall, C., & Rossman, G.B. (1989). Designing qualitative research. Newbury Park, CA: Sage.

Mathews, P., & Chuntian, C. (Eds.). (2004). Professionalism in teaching English as a second 
language (ESL) in Canada and abroad. TESL Canada Journal, 4, special issue No. 4, i-ii. 

McNamara, T. (1998). Policy and social considerations in language assessment. Annual Review 
of Applied Linguistics, 18, 304-309. 

Messick, S. (1996). Validity and washback in language testing. Language Testing, 13, 241-256. 

Pellegrino, J.W., Chudowsky, N., & Glaser, R. (Eds.). (2001). Knowing what students know: The
science and design of educational assessment. Washington, DC: National Academy Press.

Solomon, P.G. (2002). The assessment bridge: Positive ways to link tests to learning, standards, and 
curriculum improvement. Thousand Oaks, CA: Corwin Press. 

Stobart, G. (2003). The impact of assessment: Intended and unintended consequences. 
Assessment in Education, 16(2), 139-140. 

Strauss, A., & Corbin, J. (1998). Basics of qualitative research: Techniques and procedures for 
developing grounded theory (2nd ed.). Thousand Oaks: Sage. 

Tesch, R. (1990). Qualitative research: Analysis types and software tools. New York: Falmer.

Turner, C.E. (2001a). Developing an empirically based rating scale for evaluating speaking ability at 
the secondary 4 and 5 levels (Report). Quebec: Ministry of Education. 

Turner, C.E. (2001b). The need for impact studies of L2 performance testing and rating:
Identifying areas of potential consequences at all levels of the testing cycle. In A. Brown, C.
Elder, E. Grove, K. Hill, N. Iwashita, T. Lumley, T. McNamara, & K. O'Loughlin (Eds.),
Experimenting with uncertainty: Language testing essays in honour of Alan Davies, Studies in
Language Testing 11 (pp. 127-139). Cambridge, UK: Cambridge University Press.

Turner, C.E. (2002, December). Investigating high-stakes test impact at the classroom level. Paper 
presented at the annual meeting of the Language Testing Colloquium, Hong Kong. 

Turner, C.E. (2005, May). Professionalism and the impact of high-stakes tests at the classroom level:
The speaking component of the ESL Provincial Exam in Quebec. Paper presented at the TESL
Canada Conference, Ottawa.

Turner, C.E., & Upshur, J.A. (1996). Developing rating scales for the assessment of second 
language performance. In G. Wigglesworth & C. Elder (Eds.), Australian review of applied 
linguistics: Series S, No. 13. The language testing cycle: From inception to washback (pp. 55-79). 
Melbourne: ARAL. 

Wall, D. (1999). The impact of high-stakes examinations on classroom teaching: A case using insights 
from testing and innovation theory. Unpublished doctoral dissertation. University of 
Lancaster, UK. 

Wall, D. (2000). The impact of high-stakes testing on teaching and learning: Can this be 
predicted or controlled? System, 28, 499-509. 

Appendix (Turner, 2004, WB project)

Final Teacher Questionnaire 

This information will help us understand better your impressions of the speaking section of the
final provincial examination and its relation to teaching activities. All information will be
treated in the strictest confidence. Thank you very much for your time.

Part 1: Your Background Information 
Please check [ ] the appropriate answer. 

(1) Your gender: [ ] male [ ] female

(2) Your age: [ ] 20-29 [ ] 30-39 [ ] 40-49 [ ] above 50

(3) Your mother tongue: [ ] English [ ] French [ ] Other, Specify

(4) Number of years you have been teaching: [ ] 0-2 years [ ] 3-6 years [ ] 7-10 years
[ ] 11 years or more

(5) Number of hours you teach ESL per week:
[ ] 0-4 hours [ ] 4-10 hours [ ] 11 hours or more

(5a) Levels you teach: [ ] Secondary 4 [ ] Secondary 5 [ ] Other, Specify
(5b) Class types: [ ] Regular ESL [ ] ESLA [ ] Enriched [ ] Other, Specify
Comments on (5, 5a, 5b):

(6) Your academic background: [ ] Bachelors [ ] Bachelors plus Certificate [ ] Masters
[ ] PhD [ ] Other, Specify:

(7) Do you have specific training in ESL? [ ] Yes [ ] No 

(8) Have you taken courses specifically in testing and evaluation? [ ] Yes [ ] No 

(9) Have you been involved in workshops focusing on testing/evaluation?
[ ] Yes [ ] No

(10) Region you teach in:
[ ] Montreal [ ] Montreal region (Laval, South Shore) [ ] Eastern Townships
[ ] Laurentians
[ ] Quebec City [ ] Central Quebec (Mauricie, Charlevoix, Chaudiere regions)
[ ] Saguenay-Lac-St-Jean region [ ] Western Quebec and Hull region
[ ] Eastern Quebec (Gaspe region, Manicouagan, Duplessis and the Magdalen Islands)
[ ] James Bay and Northern Quebec
Specify city/municipality:

Part 2: Speaking Evaluation 

In the brackets [ ], please mark the following on a four point scale as: 

[1] strongly disagree [2] disagree [3] agree [4] strongly agree 

(1) [ ] I believe the speaking activities on the final exam are an appropriate
indicator of the student's ability.

Comments:

(2) [ ] I believe the new speaking scale for the final provincial examination
accurately measured the speaking ability of my students.

Comments:

(3) [ ] I felt comfortable using the new speaking scale in the final provincial
examination.

Comments:

(4) [ ] I had the opportunity to practice using the new speaking scale before final
provincial speaking evaluation.

Comments:

(5) [ ] The new speaking scale changed my way of thinking about the assessment of
my students.

Comments:

(6) [ ] The new speaking scale changed my teaching in some ways.

Comments:

(7) [ ] I had the opportunity to explain the new speaking scale to my students.

Comments:

(8) [ ] I had the opportunity to have my students use the new speaking scale
themselves.

Comments:

(9) [ ] The amount and frequency of English instructions increased in my classroom
as the final examination approached.

Comments:

(10) [ ] I felt having the instructions in English in the final provincial examination
to be problematic for my students.

Comments:

(11) [ ] The amount and frequency of speaking tasks similar to the final speaking
examination increased in my classroom as this examination approached.

Comments:

(12) [ ] I believe evaluation procedures drive (affect) teaching/learning.

Comments:

(13) [ ] I believe evaluation procedures should drive (affect) teaching/learning.

Comments:

Please answer the following questions in your own words. 

(14) How many weeks after you received it did you administer the speaking section 
of the provincial examination? 

(15) What factors affected this timing? 

(16) Please comment below on whether you prepared your students for the
speaking section of the provincial exam. If you did, please comment on how you
prepared your students.





